Add agent tools tests #44125
Conversation
```python
print(f"Fibonacci(10) = {result}")
"""

vector_store = openai_client.vector_stores.create(name="CodeAnalysisStore")
```
After the test is done, you need to delete it.
It is deleted on line 156:
openai_client.vector_stores.delete(vector_store.id)
It does have a problem of not being deleted if the test fails, but as discussed in the other comment, this is a bigger issue across the whole test suite.
```python
print("✓ Code file analysis completed")

# Cleanup
project_client.agents.delete_version(agent_name=agent.name, agent_version=agent.version)
```
I just realized our agents SDK has the following problem, and the same applies here. I might want to do this as a new PR:
If an assertion fails, the delete won't be executed. Perhaps we can have a decorator as a wrapper of the test that has a try/catch/finally. In the finally block, delete the agent.
Yes, I noticed this, and I agree it should be done in another PR if we are going to do it. One thing I will say is that the current behavior is handy when running locally: when a test fails, I still have the agent, so I can debug further by sending it more requests, including from the playground. Whichever way we pick, it should be consistent across the tests.
This commit introduces extensive test coverage for agent tools functionality, validating various tool types and their combinations across different scenarios.

New test coverage:
- Individual tool tests: file search, code interpreter, function tools, AI search, web search, Bing grounding, MCP, and image generation
- Multi-tool integration tests: combinations of file search, code interpreter, and functions
- Conversation-based tool tests: multi-turn interactions with various tools
- Model verification tests: basic validation across different models
- Async test support: parallel execution testing for AI search

Test organization:
- tests/agents/tools/test_agent_*.py: individual tool validation
- tests/agents/tools/multitool/*: multi-tool integration scenarios
- tests/agents/test_model_verification.py: model compatibility checks

Infrastructure updates:
- Enhanced servicePreparer with connection IDs for Bing, AI Search, and MCP
- Sample file improvements (agent naming, typo fixes)
- Comprehensive README documentation for agent tools tests

Note: one test (code interpreter file download) currently fails due to a known service limitation where the container file download API does not support token authentication. This will be resolved once the service adds support.
…oyment

- Image model deployment is now configurable via AZURE_AI_PROJECTS_TESTS_IMAGE_MODEL_DEPLOYMENT_NAME
- The test automatically checks whether the image model deployment exists in the project using deployments.get()
- Gracefully skips the test if the image model is not available (instead of hardcoded region checks)
- Added image_model_deployment_name to servicePreparer for proper sanitization in recordings
- Defaults to 'gpt-image-1-mini' if the environment variable is not set

This allows the test to run across different regions/projects with varying image model availability.
This test will be maintained in a separate branch for specialized testing.
703e1dd to 0101ab2
Pull request overview
This PR adds comprehensive test coverage for Azure AI Projects agent tools functionality, generated from existing samples. The tests validate various agent capabilities including web search, file search, code interpreter, function tools, MCP integration, and more complex multi-tool scenarios.
Key changes:
- Addition of 11 single-tool test files covering individual agent capabilities (web search, MCP, image generation, function tools, file search, code interpreter, Bing grounding, AI search)
- Addition of 5 multi-tool test files testing combinations of tools (file search + function, file search + code interpreter, etc.)
- New environment variables in test configuration for connection IDs and settings
- Fixed typo in sample documentation ("Bear" → "Bearer")
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| sdk/ai/azure-ai-projects/tests/test_base.py | Added sanitization patterns for new test environment variables (Bing, AI Search, MCP connections, image model deployment) |
| sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_web_search.py | Test for WebSearchPreviewTool with location-based queries |
| sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_tools_with_conversations.py | Tests for using function, file search, and code interpreter tools within conversations |
| sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_mcp.py | Tests for MCP tool with public and authenticated GitHub API access |
| sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_image_generation.py | Test for ImageGenTool with base64 image validation |
| sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_function_tool.py | Tests for custom function tools including multi-turn conversations and context-dependent follow-ups |
| sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_file_search_stream.py | Test for FileSearchTool with streaming responses |
| sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_file_search.py | Tests for FileSearchTool including negative test for unsupported file types and multi-turn conversations |
| sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_code_interpreter.py | Tests for CodeInterpreterTool including simple math and file generation (latter skipped due to known bug) |
| sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_bing_grounding.py | Tests for BingGroundingAgentTool with URL citations |
| sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_ai_search_async.py | Async parallel test for AI Search question answering |
| sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_ai_search.py | Synchronous test for AI Search (skipped in favor of faster async version) |
| sdk/ai/azure-ai-projects/tests/agents/tools/multitool/test_multitool_with_conversations.py | Test for file search and function tools in same conversation |
| sdk/ai/azure-ai-projects/tests/agents/tools/multitool/test_agent_file_search_code_interpreter_function.py | Tests combining file search, code interpreter, and function tools (3-4 tools) |
| sdk/ai/azure-ai-projects/tests/agents/tools/multitool/test_agent_file_search_and_function.py | Tests for file search + function tool combinations across various workflows |
| sdk/ai/azure-ai-projects/tests/agents/tools/multitool/test_agent_file_search_and_code_interpreter.py | Tests for file search + code interpreter tool combinations |
| sdk/ai/azure-ai-projects/tests/agents/tools/multitool/test_agent_code_interpreter_and_function.py | Tests for code interpreter + function tool combinations |
| sdk/ai/azure-ai-projects/tests/agents/tools/\_\_init\_\_.py | Empty `__init__.py` file for test module |
| sdk/ai/azure-ai-projects/samples/agents/tools/sample_agent_mcp_with_project_connection.py | Fixed typo: "Bear" → "Bearer" in authentication header comment |
| sdk/ai/azure-ai-projects/.env.template | Added environment variables for Bing, AI Search, and MCP connection testing |
```python
print(f"Response: {response_text[:300]}...")

assert len(response_text) > 50
response_lower = response_lower = response_text.lower()
```
Copilot AI · Nov 22, 2025
Variable assignment error: the line `response_lower = response_lower = response_text.lower()` has a duplicate assignment. It should be just `response_lower = response_text.lower()`.

```diff
- response_lower = response_lower = response_text.lower()
+ response_lower = response_text.lower()
```
```python
vector_store = openai_client.vector_stores.create(name="SalesDataStore")
print(f"Vector store created (id: {vector_store.id})")

txt_file = BytesIO(txt_content.encode("utf-8"))
```
Copilot AI · Nov 22, 2025
Import placement: BytesIO is used on line 65 but is only imported later at line 386 (within a function). For better code organization and to follow Python conventions, the import should be moved to the top of the file with other imports.
```python
# pylint: disable=too-many-lines,line-too-long,useless-suppression
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed-----------------------------------------------------------------------------------------
```
Copilot AI · Nov 22, 2025
The copyright header is incomplete: it shows `# Licensed---...` instead of `# Licensed under the MIT License.` This appears to be a copy-paste error with excess dashes.

```diff
- # Licensed-----------------------------------------------------------------------------------------
+ # Licensed under the MIT License.
```
Description
I generated a set of tests from the existing agent tools samples we have. They fall into two categories: single-tool tests, and multi-tool tests under the `multitool` folder.

I have observed a high pass rate for these tests on gpt-4o. For other models, I see a higher failure rate. From my investigations so far, these failures seem to be surfacing real issues that can occur between different combinations of tools and models.